UCL CDS Symposium on Data Science in Public Health
Department of Statistical Science, UCL
Resources
Slides and code here: github.com/n8thangreen/data-science-in-health-talk
Project title: Assessing the factors that determine health literacy, and the size of their influence, in Newham
Health literacy is broadly defined as the ability to access, understand, appraise, and communicate health information, enabling individuals to engage in healthcare and maintain good health throughout their lives.
| Small Area Estimation (SAE) | | HTA / Statistics Method |
|---|---|---|
| Weighted Logistic Regression with Synthetic Estimation | \(\longrightarrow\) | Multilevel Regression with Post-stratification (MRP) |
| Linear Plug-In Model (Equivalent to Regression-Synthetic Estimator at Unit Level) | \(\longrightarrow\) | Simulated Treatment Comparison (STC) |
| Residual-adjusted synthetic estimation | \(\longrightarrow\) | Targeted Maximum Likelihood Estimation (TMLE) (in causal inference) |
Newham Residents Survey 2023 (NRS)
Skills for Life (SfL) Survey 2011
Additional data
The predicted probability for individual \(i\) is defined as: \[ \hat{\pi}_i = \text{logit}^{-1} \left( \hat{\beta}_0 + \sum_{x} \hat{\beta}^{x}_{\gamma_x[i]} \right) \]
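A minimal sketch of this calculation, assuming \(\gamma_x[i]\) denotes the category of covariate \(x\) that individual \(i\) belongs to. The covariate names, levels, and coefficient values below are hypothetical, chosen only to illustrate the formula; they are not estimates from the Newham model.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(intercept, effects, categories):
    """Predicted probability for one individual.

    intercept  : estimated intercept (beta_0 hat)
    effects    : dict mapping covariate name -> {level: coefficient}
    categories : dict mapping covariate name -> this individual's level
    """
    linear_predictor = intercept + sum(
        effects[x][level] for x, level in categories.items()
    )
    return inv_logit(linear_predictor)


# Hypothetical coefficients for two illustrative covariates (assumption).
beta0 = -0.5
beta = {
    "age": {"16-44": 0.0, "45+": 0.4},
    "sex": {"female": 0.1, "male": -0.1},
}
person = {"age": "45+", "sex": "female"}

pi_hat = predicted_probability(beta0, beta, person)
print(round(pi_hat, 3))
```

Here the linear predictor sums the intercept and one coefficient per covariate, selected by the individual's category, before applying the inverse logit.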
The health literacy probabilities for each demographic category (cell \(c\)) are weighted by their proportion in the actual Newham population
With 11 covariates there are 13,824 cells
The post-stratified estimate is: \[ \hat{\pi}^{\text{mrp}} = \sum_{c = 1}^{|\mathcal{S}|} w_c \hat{\pi}_{c} \]
\(\mathcal{S}\) is the set of all covariate combinations
\(N_c\) is the population frequency for cell \(c\)
\(N\) is the total population size
\(w_c = N_{c} / N\) are the combination weights
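The post-stratification step can be sketched as a weighted average over cells. This is an illustration under stated assumptions: the three cells, their predicted probabilities, and their population counts are made-up numbers, and `poststratify` is a hypothetical helper name, not a function from the talk's code.

```python
def poststratify(cell_probs, cell_counts):
    """Population-weighted average of cell-level predicted probabilities.

    cell_probs  : predicted probability pi_c hat for each cell c
    cell_counts : population frequency N_c for each cell c
    """
    if len(cell_probs) != len(cell_counts):
        raise ValueError("cell_probs and cell_counts must have equal length")
    total = sum(cell_counts)  # N, the total population size
    if total <= 0:
        raise ValueError("total population must be positive")
    weights = [n_c / total for n_c in cell_counts]  # w_c = N_c / N
    return sum(w * p for w, p in zip(weights, cell_probs))


# Hypothetical example with three cells (assumption, not Newham data).
pi_c = [0.2, 0.5, 0.8]
N_c = [500, 300, 200]

pi_mrp = poststratify(pi_c, N_c)
print(round(pi_mrp, 2))
```

With these illustrative inputs the weights are 0.5, 0.3, and 0.2, so the estimate is 0.5(0.2) + 0.3(0.5) + 0.2(0.8) = 0.41.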
Adopt Surface Under the Cumulative Ranking Curve (SUCRA)
Percentage of the maximum possible cumulative rank an intervention can achieve
Provides a single value, where a higher SUCRA indicates a better overall rank relative to others: \[ \text{SUCRA}_{ij} = \sum_{r=1}^{n-1} P_{ijr} / (n-1), \]
where \(P_{ijr}\) is the cumulative probability for variable \(i\) at level \(j\) and rank \(r\), and \(n\) is the number of ranked items
The mean rank is \[ \mathbb{E}[\text{rank}(i,j)] = n - \sum_{r=1}^{n-1} P_{ijr}. \]
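A short sketch of both quantities, assuming rank 1 is the best rank and that the input is a vector of rank probabilities (the probability of occupying each rank \(r = 1, \dots, n\)) for a single variable level. The probabilities below are invented for illustration.

```python
from itertools import accumulate


def sucra_and_mean_rank(rank_probs):
    """Return (SUCRA, mean rank) from rank probabilities.

    rank_probs[r - 1] is the probability of holding rank r, with rank 1
    assumed to be best. The probabilities should sum to 1.
    """
    n = len(rank_probs)
    if n < 2:
        raise ValueError("need at least two ranks")
    if abs(sum(rank_probs) - 1.0) > 1e-8:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = list(accumulate(rank_probs))  # P_r for r = 1..n
    cum_sum = sum(cumulative[: n - 1])         # sum over r = 1..n-1
    sucra = cum_sum / (n - 1)
    mean_rank = n - cum_sum
    return sucra, mean_rank


# Hypothetical rank probabilities for one variable level (assumption).
probs = [0.5, 0.3, 0.2]
sucra, mean_rank = sucra_and_mean_rank(probs)
print(round(sucra, 2), round(mean_rank, 2))
```

For these inputs the cumulative probabilities at ranks 1 and 2 are 0.5 and 0.8, giving SUCRA = 1.3 / 2 = 0.65 and mean rank = 3 - 1.3 = 1.7, which matches the direct expectation 1(0.5) + 2(0.3) + 3(0.2).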
Concern was raised that the relationship between the covariates and the health literacy outcomes may have changed over time
So using SfL 2011 is not appropriate?
Suggested more recent survey data PIAAC 2023
Also fitted to SfL 2001 to see trend back in time
Note: Figure from Hutcheon, Chiolero, and Hanley (2010)
Nathan Green | UCL | n.green@ucl.ac.uk